Overview
Dataset statistics
| Number of variables | 5 |
|---|---|
| Number of observations | 4079 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.1 MiB |
| Average record size in memory | 275.5 B |
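The two memory figures above can be cross-checked against each other. The sketch below assumes the average record size is simply total memory divided by the number of observations; the report does not state this, so treat it as a plausibility check only.

```python
# Figures copied from the "Dataset statistics" table.
n_observations = 4079
avg_record_bytes = 275.5

# Assumption: average record size = total bytes / number of observations.
total_bytes = n_observations * avg_record_bytes
total_mib = total_bytes / 2**20  # 1 MiB = 1,048,576 bytes

print(f"{total_bytes:.1f} B is about {total_mib:.2f} MiB")
```

Rounded to one decimal place, the result agrees with the reported 1.1 MiB.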
Variable types
| Numeric | 2 |
|---|---|
| Text | 3 |
Reproduction
| Analysis started | 2026-02-08 07:03:43.979576 |
|---|---|
| Analysis finished | 2026-02-08 07:03:55.410577 |
| Duration | 11.43 seconds |
| Software version | ydata-profiling v4.18.1 |
| Download configuration | config.json |
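The reported duration follows directly from the two timestamps. A minimal sketch using only the standard library (the format string is an assumption about how the timestamps were printed):

```python
from datetime import datetime

# Assumed timestamp layout: date, time, and microseconds.
FMT = "%Y-%m-%d %H:%M:%S.%f"

started = datetime.strptime("2026-02-08 07:03:43.979576", FMT)
finished = datetime.strptime("2026-02-08 07:03:55.410577", FMT)

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```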
Variables
ID
Real number (ℝ)
Uniform Unique
| Distinct | 4079 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2040 |
| Minimum | 1 |
| Maximum | 4079 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 63.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 204.9 |
| Q1 | 1020.5 |
| median | 2040 |
| Q3 | 3059.5 |
| 95-th percentile | 3875.1 |
| Maximum | 4079 |
| Range | 4078 |
| Interquartile range (IQR) | 2039 |
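The quantiles above are what linear interpolation between order statistics gives for the integers 1 to 4079. The sketch assumes that ID holds exactly those integers (consistent with the minimum, maximum, and distinct count) and that the report interpolates linearly; neither is stated outright.

```python
import statistics

# Assumption: the ID column is exactly 1, 2, ..., 4079.
ids = list(range(1, 4080))

# method="inclusive" treats the data as containing its own minimum and
# maximum and interpolates linearly between neighbouring values.
q1, median, q3 = statistics.quantiles(ids, n=4, method="inclusive")
twentieths = statistics.quantiles(ids, n=20, method="inclusive")
p5, p95 = twentieths[0], twentieths[-1]

iqr = q3 - q1
value_range = max(ids) - min(ids)
print(p5, q1, median, q3, p95, iqr, value_range)
```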
Descriptive statistics
| Standard deviation | 1177.6502 |
|---|---|
| Coefficient of variation (CV) | 0.57727951 |
| Kurtosis | -1.2 |
| Mean | 2040 |
| Median Absolute Deviation (MAD) | 1020 |
| Skewness | 0 |
| Sum | 8321160 |
| Variance | 1386860 |
| Monotonicity | Strictly increasing |
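Because ID appears to be the sequence 1 to n, most of the descriptive statistics have closed forms. The sketch below assumes that sequence and a sample variance with n - 1 in the denominator; the kurtosis line uses the population formula for a discrete uniform distribution, which only approximates whatever estimator the report applies.

```python
import math

n = 4079  # number of observations; ID assumed to be exactly 1..n

total = n * (n + 1) // 2       # sum of 1..n
mean = (n + 1) / 2
variance = n * (n + 1) / 12    # sample variance (n - 1 denominator) of 1..n
std = math.sqrt(variance)
cv = std / mean                # coefficient of variation

# Population excess kurtosis of a discrete uniform distribution on 1..n.
excess_kurtosis = -6 * (n**2 + 1) / (5 * (n**2 - 1))

print(total, mean, variance, round(std, 4), round(cv, 8), round(excess_kurtosis, 1))
```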
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | < 0.1% |
| 2725 | 1 | < 0.1% |
| 2712 | 1 | < 0.1% |
| 2713 | 1 | < 0.1% |
| 2714 | 1 | < 0.1% |
| 2715 | 1 | < 0.1% |
| 2716 | 1 | < 0.1% |
| 2717 | 1 | < 0.1% |
| 2718 | 1 | < 0.1% |
| 2719 | 1 | < 0.1% |
| Other values (4069) | 4069 | |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1 | 1 | |
| 2 | 1 | |
| 3 | 1 | |
| 4 | 1 | |
| 5 | 1 | |
| 6 | 1 | |
| 7 | 1 | |
| 8 | 1 | |
| 9 | 1 | |
| 10 | 1 | |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 4079 | 1 | |
| 4078 | 1 | |
| 4077 | 1 | |
| 4076 | 1 | |
| 4075 | 1 | |
| 4074 | 1 | |
| 4073 | 1 | |
| 4072 | 1 | |
| 4071 | 1 | |
| 4070 | 1 | |
Name
Text
| Distinct | 4001 |
|---|---|
| Distinct (%) | 98.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 307.8 KiB |
Length
| Max length | 33 |
|---|---|
| Median length | 27 |
| Mean length | 8.5126256 |
| Min length | 3 |
Unique
| Unique | 3936 |
|---|---|
| Unique (%) | 96.5% |
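The report lists both "Distinct" and "Unique". A common reading, assumed here, is that Distinct counts the different values present while Unique counts the values that occur exactly once. The toy column below is hypothetical and only illustrates the difference; the last two lines recompute the percentages from the counts in the tables.

```python
from collections import Counter

# Hypothetical toy column, not taken from the dataset.
names = ["Springfield", "Kabul", "Springfield", "Herat", "Kabul", "Amsterdam"]

counts = Counter(names)
distinct = len(counts)                              # different values present
unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once

# Percentages for the Name column, from the counts reported above.
distinct_pct = round(100 * 4001 / 4079, 1)
unique_pct = round(100 * 3936 / 4079, 1)
print(distinct, unique, distinct_pct, unique_pct)
```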
Sample
| 1st row | Kabul |
|---|---|
| 2nd row | Qandahar |
| 3rd row | Herat |
| 4th row | Mazar-e-Sharif |
| 5th row | Amsterdam |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| de | 81 | 1.6% |
| san | 62 | 1.2% |
| la | 25 | 0.5% |
| santa | 22 | 0.4% |
| são | 16 | 0.3% |
| del | 16 | 0.3% |
| city | 12 | 0.2% |
| do | 12 | 0.2% |
| el | 11 | 0.2% |
| saint | 10 | 0.2% |
| Other values (4319) | 4722 | |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| a | 5025 | 14.5% |
| n | 2423 | 7.0% |
| i | 2245 | 6.5% |
| o | 2192 | 6.3% |
| e | 2048 | 5.9% |
| r | 1963 | 5.7% |
| u | 1482 | 4.3% |
| l | 1371 | 3.9% |
| s | 1184 | 3.4% |
| t | 1161 | 3.3% |
| Other values (81) | 13629 | |
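A character table like this can be produced by counting every character across the column and dividing by the total. The sketch runs on the five sample names only, so its counts differ from the full-column figures; it assumes counting is case-sensitive, which the separate upper- and lowercase entries in the report suggest.

```python
from collections import Counter

# The five sample names shown in the report; the full column has 4079 rows.
names = ["Kabul", "Qandahar", "Herat", "Mazar-e-Sharif", "Amsterdam"]

# Count every character, case-sensitively ("A" and "a" are kept apart).
char_counts = Counter("".join(names))
total_chars = sum(char_counts.values())

for char, count in char_counts.most_common(3):
    print(f"{char!r}: {count} ({100 * count / total_chars:.1f}%)")

# Share of "a" in the full column, from the reported counts.
share_a = round(100 * 5025 / 34723, 1)
```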
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 34723 | |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 34723 | |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 34723 | |
CountryCode
Text
| Distinct | 233 |
|---|---|
| Distinct (%) | 5.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 271.1 KiB |
Length
| Max length | 7 |
|---|---|
| Median length | 3 |
| Mean length | 3.0676636 |
| Min length | 3 |
Unique
| Unique | 85 |
|---|---|
| Unique (%) | 2.1% |
Sample
| 1st row | AFG |
|---|---|
| 2nd row | AFG |
| 3rd row | AFG |
| 4th row | AFG |
| 5th row | NLD |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| chn | 363 | 8.9% |
| ind | 302 | 7.4% |
| usa | 274 | 6.7% |
| bra | 250 | 6.1% |
| jpn | 248 | 6.1% |
| rus | 189 | 4.6% |
| mex | 171 | 4.2% |
| phl | 133 | 3.3% |
| deu | 93 | 2.3% |
| idn | 85 | 2.1% |
| Other values (223) | 1971 | |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| N | 1432 | 11.4% |
| R | 1109 | 8.9% |
| A | 1093 | 8.7% |
| U | 879 | 7.0% |
| S | 684 | 5.5% |
| P | 622 | 5.0% |
| D | 608 | 4.9% |
| I | 594 | 4.7% |
| C | 591 | 4.7% |
| H | 587 | 4.7% |
| Other values (20) | 4314 | |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 12513 | |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 12513 | |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 12513 | |
District
Text
| Distinct | 1352 |
|---|---|
| Distinct (%) | 33.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 308.5 KiB |
Length
| Max length | 20 |
|---|---|
| Median length | 17 |
| Mean length | 8.9823486 |
| Min length | 1 |
Unique
| Unique | 809 |
|---|---|
| Unique (%) | 19.8% |
Sample
| 1st row | Kabol |
|---|---|
| 2nd row | Qandahar |
| 3rd row | Herat |
| 4th row | Balkh |
| 5th row | Noord-Holland |
Words
| Value | Count | Frequency (%) |
|---|---|---|
| pradesh | 96 | 1.8% |
| west | 83 | 1.6% |
| unknown | 73 | 1.4% |
| california | 73 | 1.4% |
| são | 70 | 1.3% |
| england | 70 | 1.3% |
| paulo | 69 | 1.3% |
| central | 61 | 1.2% |
| | 51 | 1.0% |
| java | 49 | 0.9% |
| Other values (1457) | 4575 | |
Most occurring characters
| Value | Count | Frequency (%) |
|---|---|---|
| a | 5486 | 15.0% |
| n | 2987 | 8.2% |
| i | 2449 | 6.7% |
| o | 2308 | 6.3% |
| e | 2008 | 5.5% |
| r | 1997 | 5.5% |
| s | 1513 | 4.1% |
| t | 1376 | 3.8% |
| l | 1367 | 3.7% |
| u | 1237 | 3.4% |
| Other values (80) | 13911 | |
Most occurring categories
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 36639 | |
Most occurring scripts
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 36639 | |
Most occurring blocks
| Value | Count | Frequency (%) |
|---|---|---|
| (unknown) | 36639 | |
Population
Real number (ℝ)
| Distinct | 3833 |
|---|---|
| Distinct (%) | 94.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 342663.99 |
| Minimum | 42 |
| Maximum | 9981619 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 63.7 KiB |
Quantile statistics
| Minimum | 42 |
|---|---|
| 5-th percentile | 91698.4 |
| Q1 | 115455 |
| median | 167051 |
| Q3 | 304288 |
| 95-th percentile | 1097146.1 |
| Maximum | 9981619 |
| Range | 9981577 |
| Interquartile range (IQR) | 188833 |
Descriptive statistics
| Standard deviation | 690721.99 |
|---|---|
| Coefficient of variation (CV) | 2.0157414 |
| Kurtosis | 81.157231 |
| Mean | 342663.99 |
| Median Absolute Deviation (MAD) | 63947 |
| Skewness | 7.9082322 |
| Sum | 1.3977264 × 10⁹ |
| Variance | 4.7709687 × 10¹¹ |
| Monotonicity | Not monotonic |
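Several Population figures are derived from others in the same tables, so they can be cross-checked with a few lines of arithmetic. The sketch uses only numbers copied from the report and the standard definitions (CV = standard deviation / mean, variance = standard deviation squared, sum = mean × count); small differences are expected because the inputs are rounded.

```python
# Figures copied from the Population tables.
n = 4079
mean = 342663.99
std = 690721.99
q1, q3 = 115455, 304288
minimum, maximum = 42, 9981619

cv = std / mean               # coefficient of variation
variance = std ** 2
total = mean * n              # sum of the column
iqr = q3 - q1
value_range = maximum - minimum

# The mean lying above the third quartile is a sign of a long right tail,
# in line with the large positive skewness reported.
mean_above_q3 = mean > q3

print(round(cv, 4), f"{variance:.4e}", f"{total:.4e}", iqr, value_range, mean_above_q3)
```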
Histogram with fixed size bins (bins=50)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 167051 | 73 | 1.8% |
| 90000 | 12 | 0.3% |
| 101000 | 6 | 0.1% |
| 130000 | 4 | 0.1% |
| 127000 | 4 | 0.1% |
| 103300 | 4 | 0.1% |
| 92300 | 4 | 0.1% |
| 92000 | 4 | 0.1% |
| 100000 | 4 | 0.1% |
| 140800 | 4 | 0.1% |
| Other values (3823) | 3960 | |
Minimum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 42 | 1 | |
| 167 | 1 | |
| 300 | 1 | |
| 455 | 1 | |
| 503 | 1 | |
| 559 | 1 | |
| 595 | 1 | |
| 682 | 1 | |
| 700 | 1 | |
| 800 | 1 | |
Maximum 10 values
| Value | Count | Frequency (%) |
|---|---|---|
| 9981619 | 1 | |
| 9968485 | 1 | |
| 9696300 | 1 | |
| 9604900 | 1 | |
| 9269265 | 1 | |
| 8787958 | 1 | |
| 8591309 | 1 | |
| 8008278 | 1 | |
| 7980230 | 1 | |
| 7472000 | 1 | |
Interactions
Correlations
| | ID | Population |
|---|---|---|
| ID | 1.000 | -0.031 |
| Population | -0.031 | 1.000 |
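The report does not say which correlation coefficient this matrix holds. As an illustration only, the sketch below computes a Pearson coefficient by hand (the `pearson` helper is hypothetical) on the first ten sample rows; with so few rows the value will not match the -0.031 reported for the full dataset, but it shows the shape of the matrix: ones on the diagonal and symmetric off-diagonal entries.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# ID and Population of the first ten rows shown in the report's sample.
ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
population = [1780000, 237500, 186800, 127800, 731200,
              593321, 440900, 234323, 201843, 193238]

columns = [ids, population]
matrix = [[pearson(a, b) for b in columns] for a in columns]
for row in matrix:
    print([round(value, 3) for value in row])
```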
Missing values
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you visually pick out patterns in data completion at a glance.
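A nullity matrix is just a table of booleans marking which cells are missing. The sketch below builds one from three of the sample rows, assuming a missing cell would be represented as `None`; since the overview reports zero missing cells, every entry comes out `False`.

```python
columns = ["ID", "Name", "CountryCode", "District", "Population"]

# Three rows copied from the report's sample; None would mark a missing cell.
rows = [
    (1, "Kabul", "AFG", "Kabol", 1780000.0),
    (2, "Qandahar", "AFG", "Qandahar", 237500.0),
    (5, "Amsterdam", "NLD", "Noord-Holland", 731200.0),
]

# One boolean per cell: True where the value is missing.
nullity = [[value is None for value in row] for row in rows]

# Missing count per column, then overall.
missing_per_column = {
    name: sum(row[i] for row in nullity) for i, name in enumerate(columns)
}
missing_cells = sum(missing_per_column.values())
print(missing_per_column, missing_cells)
```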
Sample
First rows
| | ID | Name | CountryCode | District | Population |
|---|---|---|---|---|---|
| 0 | 1 | Kabul | AFG | Kabol | 1780000.0 |
| 1 | 2 | Qandahar | AFG | Qandahar | 237500.0 |
| 2 | 3 | Herat | AFG | Herat | 186800.0 |
| 3 | 4 | Mazar-e-Sharif | AFG | Balkh | 127800.0 |
| 4 | 5 | Amsterdam | NLD | Noord-Holland | 731200.0 |
| 5 | 6 | Rotterdam | NLD | Zuid-Holland | 593321.0 |
| 6 | 7 | Haag | NLD | Zuid-Holland | 440900.0 |
| 7 | 8 | Utrecht | NLD | Utrecht | 234323.0 |
| 8 | 9 | Eindhoven | NLD | Noord-Brabant | 201843.0 |
| 9 | 10 | Tilburg | NLD | Noord-Brabant | 193238.0 |
Last rows
| | ID | Name | CountryCode | District | Population |
|---|---|---|---|---|---|
| 4070 | 4070 | Chitungwiza | ZWE | Harare | 274912.0 |
| 4071 | 4071 | Mount Darwin | ZWE | Harare | 164362.0 |
| 4072 | 4072 | Mutare | ZWE | Manicaland | 131367.0 |
| 4073 | 4073 | Gweru | ZWE | Midlands | 128037.0 |
| 4074 | 4074 | Gaza | PSE | Gaza | 353632.0 |
| 4075 | 4075 | Khan Yunis | PSE | Khan Yunis | 123175.0 |
| 4076 | 4076 | Hebron | PSE | Hebron | 119401.0 |
| 4077 | 4077 | Jabaliya | PSE | North Gaza | 113901.0 |
| 4078 | 4078 | Nablus | PSE | Nablus | 100231.0 |
| 4079 | 4079 | Rafah | PSE | Rafah | 92020.0 |